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1.1 

Introduction 

Interest in statistical network analysis has grown massively in recent decades 
and its perspective and methods are now widely used in many scientific ar¬ 
eas which involve the study of various types of networks for representing 
structure in many complex relational systems such as social relationships, in¬ 
formation flows, protein interactions, etc. 

Social network analysis is based on the study of social relations between 
actors so as to understand the formation of social structures by the analysis 
of basic local relations. Statistical models have started to play an increasingly 
important role because they give the possibility to explain the complexity of 
social behaviour and to investigate issues on how the global features of an 
observed network may be related to local network structures. The observed 
network is assumed to be generated by local social processes which depend 
on the self-organising dyadic relations between actors. The crucial challenge 
for statistical models in social network theory is to capture and describe the 
dependency giving rise to network global topology allowing inference about 
whether certain local structures are more common than expected. 

Unfortunately the computational burden required to estimate these mod¬ 
els is the main barrier to estimation. Recent theoretical developments and 
advances in approximate procedures have given the possibility to make im¬ 
portant progress to overcome statistical inference problems. 
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In this chapter we review some of the most recent computational advances 
in the rapidly expanding field of statistical social network analysis (see [26] 
for a recent review) using the R open-source software. 

In particular we will focus on Bayesian estimation for two important fami¬ 
lies of models: exponential random graph models (ERGMs) and latent space 
models (LSMs) and we will provide the R code used to produce the results 
obtained in this chapter. 

The chapter is organised as follows: in Section 1.2, we introduce the basic 
notation for social network analysis. In Section 1.3, we highlight the basic sta¬ 
tistical work on social networks citing recent references to enable interested 
readers to learn more. In particular, our interest lies on describing exponen¬ 
tial random graph models and latent space models. In Section 1.4, we discuss 
Bayesian analysis for these two families of models and computational meth¬ 
ods on a well-known dataset using the R software. Predictive goodness-of-fit 
diagnostics are also described at the end of the section. We conclude in Sec¬ 
tion 1.6 with a discussion of some future challenges. 


1.2 

Social networks as random graphs 

Networks are relational data that can be defined as a collection of nodes inter¬ 
acting with each other and connected in a pairwise fashion. In typical applica¬ 
tions, the nodes represent a set actors of various kind (people, organisations, 
countries, etc.) and the set edges represent a specific relationship between 
them (friendship, collaboration, etc.). 

From a statistical point of view, networks are relational data represented 
as mathematical graphs. A graph consists of a set of n nodes and a set of m 
edges which define some sort of relationships between pair of nodes called 
dyads. 

The connectivity pattern of a graph can be described by an n x n adjacency 
matrix y encoding the presence or absence of an edge between node i and j: 

[If if (bf) are connected, 

\ 0, otherwise. 

Two nodes are adjacent or neighbours if there is an edge between them. If 
y L j = yjj,\/i,j then the adjacency matrix is symmetric and the graph is undi¬ 
rected, otherwise the graph is directed and it is often called digraph. Edges 



connecting a node to itself (self-loops) are generally not allowed in many ap¬ 
plications and will not be considered in this context. The nature of the edges 
between nodes can take a range of values indicating the strength, frequency, 
intensity, etc. of the relation between a dyad. In this paper we consider only 
binary networks. According to the generally used notation, y will be used to 
indicate both a random graph and its adjacency matrix. 


1.3 

Statistical modelling approaches to social network analysis 

Many probability models have been proposed in order to summarise the con¬ 
nectivity structure of social networks by utilising their network statistics. 

The family of exponential random graph models (ERGMs) is a generalisa¬ 
tion of various models which take different assumptions into account: the 
Bernoulli random graph model [5] in which edges are considered indepen¬ 
dent of each other; the pi model [12] where dyads are assumed independent, 
and its random effects variant the f >2 model [30]; and the Markov random 
graph model [6] where each pair of edges is conditionally dependent given 
the rest of the graph. The family of latent space models has been proposed 
by [11] under the assumption that each node of the graph has a unknown 
position in a latent space and the probability of the edges are functions of 
those positions and node covariates. The latent position cluster model of [10] 
represents a further extension of this approach that takes account of cluster¬ 
ing. Other latent variable modelling approaches are represented by stochas¬ 
tic blockmodels [22] that involve block model structures whereby network 
nodes are partitioned into latent classes and the presence of any relationship 
between them depends on their block membership. 

1 . 3.1 

Exponential random graph models (ERGMs) 

Introduced by [12] to model individual heterogeneity of nodes and reciprocity 
of their edges, the family of exponential random graph models (ERGMs) was 
generalised by [6], [31] and [29]. ERGMs constitute a broad class of network 
models (see [25] for an introduction) that assume that the topological struc¬ 
ture of an observed network y can be explained in terms of the relative preva¬ 
lence of a set of overlapping subgraph configurations s(y) called network 
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statistics: 


( 1 . 1 ) 


exp{0 f s(y)} 

pm = Fl z(t y 

This equation states that the probability that an observed network y given the 
set of parameters 6 is equal to the exponent of an observed vector of network 
statistics s(y) multiplied by its associated vector of unknown parameters 6 
divided by a normalising constant z(8) to make all probabilities sum to one. 
The latter is calculated as the sum over all possible network configurations 
on the same set of n nodes of the observed network. In practice z(6) is com¬ 
putationally infeasible to calculate for non trivially-small networks. 

1.3.2 

Latent space models (LSMs) 

Latent space models were introduced by [11] under the basic assumption that 
each node has an unknown position z, in a d-dimensional Euclidean latent 
space. Network edges are assumed to be conditionally independent given the 
latent positions, and the probability of an edge between nodes is modelled 
as a function of their positions. Generally, in these models the smaller the 
distance between two nodes in the latent space, the greater their probability 
of being connected. The likelihood function of latent space models can be 
written as follows: 


N 

p(y|z,a) 

¥i 


exp (a — djjY'i 
1 + exp(« — djj) 


The standard metric is the Euclidean distance (ED in Table 1.3) and is de¬ 
fined as: d,j = |zj — Zj |. As an alternative the squared Euclidean distance 
(SED in Table 1.3) is defined as: dy — |z, — z ; | 2 and has been proposed by [8] 
for computational reasons (see 1.5.2). The latent positions are assumed to be 
Normally distributed, or having a Gaussian mixture model structure in case 
of the latent position cluster models (LPCMs), a generalisation of latent space 
models where latent clusters are assumed to be useful to explain data het¬ 
erogeneity. For strongly asymmetric graph, it is suggested to use the bilinear 
latent model setting dy = Z-Zy so that the probability of a link depends on the 
angle between two actors. This model is available in the latentnet pack¬ 
age through the bilinear argument included in the ergmm function. All the 
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presented latent space network models can be extended to incorporate covari¬ 
ate informations Xy introducing a parameter /), or the degree heterogeneity in 
sending or receiving links, these parameters are called sender and receiver if 
the network is directed, or sociality if the network is undirected [18]. 


1.4 

Bayesian inference for social network models 

The Bayesian approach to statistical problems is probabilistic. Inference is 
based on the posterior distribution which is the conditional probability of the 
unknown quantities Q given the data y. The posterior distribution extracts 
the information in the data and provide a complete summary of the uncer¬ 
tainty about the unknowns via Bayes' theorem: 

^ p(y|n) p{ n) 

p(y) ( j 

Bayesian analysis is able to give us a full probabilistic framework of uncer¬ 
tainty and this is something which is essential in the context of complex sta¬ 
tistical modelling. Moreover recent research in social network analysis has 
demonstrated the advantages and effectiveness of probabilistic Bayesian ap¬ 
proaches to relational data. In this chapter we will focus on parameter infer¬ 
ence so the uncertainties Q will refer to the ERGM parameters 9 or the LSM 
parameters a and latent positions z. 

1 . 4.1 

R-based software tools 

Applied researchers interested in Bayesian statistics are increasingly attracted 
to R [23] because of the ease of which one can code algorithms to sample 
from posterior distributions as well as the significant number of packages 
contributed to the Comprehensive R Archive Network (CRAN) that provide 
tools for Bayesian inference. 

R represents a useful tool for social network analysis with many advan¬ 
tages over traditional software packages. With a little coding and patience, 
one can produce ad hoc analyses and visualisations for the problem under 
study. Moreover R has a huge set of statistical libraries so that end users can 
complement their social network analysis research with any analysis of your 
choosing within R environment. 




In this section we briefly review Bayesian tools for ERGMs and LSMs: 

• The Bergm package (version 3.0.1) [3] implements Bayesian analysis for 
ERGMs using the methods proposed by [1,2,4]- The package provides a 
comprehensive framework for Bayesian inference and model selection 
using Markov chain Monte Carlo (MCMC) algorithms. 

• The latentnet package (version 2.5.1) [16,17], which is part of the 
statnet suite of packages [9], provides comprehensive toolsets for 
Bayesian analysis for latent position and cluster network models using 
MCMC procedures. 

• The VBLPCM package (version 2.4.3) [27] contains a collection of func¬ 
tions implementing variational Bayesian Inference for the latent posi¬ 
tion cluster model. 

• The 1 vm4 net package (version 0.2) [7] contains a collection of functions 
implementing fast variational Bayesian inference for latent space mod¬ 
els. 

Other R implementations of Bayesian methods for statistical social network 
models include: RSiena [24] implementing stochastic actor-based models; 
hergm [28] implementing hierarchical ERGMs with local dependence; sna 
(belonging to the statnet suite of packages) generating posterior samples 
from Butt's Bayesian network accuracy model using Gibbs sampling. 


1.5 

Data 

We demonstrate ideas and examples throughout the paper using the Dolphin 
network dataset, an undirected social network of frequent associations be¬ 
tween 62 dolphins in a community living off Doubtful Sound, New Zealand 
(see Figure 1.1), as compiled by [19]. The results presented in this paper have 
been obtained using R version 3.1.3. To create, manipulate and visualise the 
observed network data y we can use the function network and plot in¬ 
cluded in the statnet suite of packages. 

y <- read.table("http://moreno.ss.uci.edu/dolphins.dat", 
skip = 130) 



y <- network(y, directed = FALSE) 
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plot (y, vertex.col = "blue") 



Tab. 1.1 Dolphin undirected network graph. 


1 . 5.1 

Bayesian inference for exponential random graph models 

Bayesian inference for ERGMs is based on the posterior distribution of 8 given 
the data y: 

rt%) = = slgMi rw (i.3) 

p(y) z( 6) p{y ) 

where p(y) is the evidence or marginal likelihood of y which is typically 
intractable. 

Standard MCMC methods such as the Metropolis-Hastings algorithm, can 
deal with posterior estimation as long as the target posterior density is known 
up to the model evidence p(y). Unfortunately in the ERGM context the pos¬ 
terior density p(9\y) of non-trivially small ERGMs includes two intractable 











normalising constants, the model evidence p(y) and z(0). For this reason, the 
ERGM posterior density is "doubly intractable" [21]. 

In order to carry out Bayesian inference for ERGMs, the Bergm package 
makes use of a combination of Bayesian algorithms and MCMC techniques 
[1,2], The approximate exchange algorithm circumvents the problem of com¬ 
puting the normalising constants of the ERGM likelihoods, while the use of 
multiple chains and efficient adaptive proposal strategies are able to speed up 
the computations and improve chain mixing quite significantly. 

The approximate exchange algorithm implemented by the bergm function 
can be summarised in the following way: 


For each chain, repeat until converge: 


1) generate 9' (using some proposal strategy) 


2) simulate s(y') from ERGM likelihood (using standard MCMC proce¬ 
dures such as [20]) 


3) update 0 —t O' with the log of the probability: 
min ^0, [0 - 9'] 1 [s(y') - s(y)] + log 

Let us consider the following three dimensional model including the num¬ 
ber of edges and two new specification statistics e.g.: geometrically weighted 
edgewise shared partners (gwesp) and geometrically weighted non-edgewise 
shared partners (gwesp) [14]: 

gwnsp = A EfcZi 2 11 - (1 - e~*°) k \ NEP k {y) 
gwesp = A Efcl 2 {1 - (1 - e ~*°) * } EP k (y) 

where the scale parameters cp v — (p u — 0.6, and EP k (y) and NEP k (y) are 
respectively the number of connected and non-connected pairs of nodes with 
exactly k common neighbours. 




I g 

using the MCMC algorithm described above. In this example we use the 
parallel adaptive direction sampling (ADS) procedure described in [1] for step 
1 and 1,200 iterations (main .iters) for each chain. We set the number of 
MCMC chains to 9 by using the argument nchains. The number of iterations 
used to simulate network statistics s(y') at step 2 is defined by the argument 
aux .iters and it is set to 3, 000. 

model <- y ~ edges + 

gwnsp(.6, fixed = TRUE) + 
gwesp(.6, fixed = TRUE) 


post <- bergm(model, 

main.iters = 1200, 
aux.iters = 3000, 
nchains = 9) 

bergm.output(post, lag = 200) 

The bergm. output function produces MCMC diagnostic plots (Fig¬ 
ure 1.2) and the estimated posterior means, standard deviations, and accep¬ 
tance rates for each of the 9 chains and for the aggregated overall chain. 

MCMC results for Model: 

y ~ edges + gwnsp(.6, fixed = TRUE) + gwesp(.6, fixed = TRUE) 


Posterior mean: 




thetal 

(edges) 

Chain 

i 

-2 

.3512134 

Chain 

2 

-2 

.3889219 

Chain 

3 

-2 

.3362841 

Chain 

4 

-2 

.5628317 

Chain 

5 

-2 

.3709133 

Chain 

6 

-2 

. 5407332 

Chain 

7 

-2 

.4301006 

Chain 

8 

-2 

.4523673 

Chain 

9 

-2 

.3681535 

Posterior 

■ sd: 




thetal 

(edges) 

Chain 

1 

0 . 

33244522 

Chain 

2 

0 . 

43199257 

Chain 

3 

0 . 

37344505 

Chain 

4 

0 . 

41110025 

Chain 

5 

0 . 

48867437 

Chain 

6 

0 . 

36796911 


theta2 (gwnsp.fixed.0.6) 
-0.1864153 
-0.1800818 
-0.1779192 
-0.1646549 
-0.1799828 
-0.1646283 
-0.1783869 
-0.1799679 
-0.1789971 


theta2 (gwnsp.fixed.0.6) 
0.03951240 
0.04669447 
0.04083082 
0.04962470 
0.05371030 
0.04055858 


theta3 (gwesp.fixed.0.6) 
0.7521076 
0.7515701 
0.7068975 
0.7898211 
0.7316151 
0.7798916 
0.7698418 
0.8017089 
0.7424549 


theta3 (gwesp.fixed.0.6) 
0.11308154 
0.11585996 
0.10625747 
0.11782669 
0.14904932 
0.13496058 
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Chain 

7 

0.42739511 

0.04311092 

0.14186529 

Chain 

8 

0.48943818 

0.05430663 

0.12666345 

Chain 

9 

0.38717484 

0.04531716 

0.12718701 



Acceptance rate: 



Chain 

1 

0.1316667 



Chain 

2 

0.1375000 



Chain 

3 

0.1158333 



Chain 

4 

0.1550000 



Chain 

5 

0.1475000 



Chain 

6 

0.1566667 



Chain 

7 

0.1700000 



Chain 

8 

0.1525000 



Chain 

9 

0.1500000 



Overall 

posterior density estimate: 





thetal (edges) theta2 (gwnsp 

.fixed.0.6) theta3 

(gwesp.fixed.0.6) 

Post. 

mean -2.4223910 

-0.17678157 

0.7584343 

Post. 

sd 

0.4222507 

0.04675703 

0.1296237 


Overall acceptance rate: 0.15 

In this example, we can observe a low density effect expressed by the neg¬ 
ative value of the posterior mean of the edge effect parameter ( 6 \) combined 
with the negative value of multiple connectivity ( 62 ) and positive value of 
transitivity parameter ( 03 ). 

1 . 5.2 

Bayesian inference for latent space models 

A fully Bayesian approach for latent space models allows the estimation of all 
the parameters and latent positions simultaneously e.g. via MCMC sampling 
or variational approximation. 

In this paragraph, we perform an empirical Bayesian analysis in order to 
compare different computational approaches for the visualisation and predic¬ 
tion properties of LSMs with and without clustering. To carry out this type of 
analysis we can use the following R packages: latentnet [16], VBLPCM [27] 
and lvm4net [7]. Their main features of these packages are shown in Ta¬ 
ble 1.3. 

The latentnet package uses Bayesian MCMC algorithms so the model 
estimation is computationally very expensive, and the times to estimate the 
model can become extremely large even for networks of the order of hun¬ 
dreds of nodes. For this reason the variational Bayes approach to estimate the 
latent space model and the latent position cluster model in order to make fea- 



MCMC output for Model: y ~ edges + gwnsp(tau, fixed = TRUE) + gwesp(tau, fixed = TRUE) 



Tab. 1.2 MCMC diagnostics for the overall chain. The 3 plot columns are: estimated marginal 
posterior densities (left), traces (center) and autocorrelation plots (right). 


Tab. 1.3 Comparison of the main features of the packages for latent space modeling 



Model 

ED SED 

Inference Method 

MCMC Variational 

Clustering 

latentnet 

/ 

X 

/ 

X 

/ 

VBLPCM 

/ 

X 

X 

y 

/ 

lvm4net 

X 

y 

X 

y 

X 


sible the modelling of larger networks [8,27], The basic idea of this method 










is to find a lower bound of the log-likelihood by introducing a variational 
posterior distribution q and maximize it [15]. The posterior probability of the 
unknown parameters (z, a) can be written in the following form: 

N 

P(z/*|y) = Cp(y\z,K)p(oc)Ylp(zi), 

i=i 

where C is the unknown normalising constant. In the VBLPCM package, a 
hierarchical prior structure is taken into consideration. 

In [8], the variational posterior q(z,a\y) is defined in the following way: 

N 

q{z,a\y) = q{pi)Ylq{zi), 
i=1 

where q(a) — A/"(£, i/’ 2 ) and q(zj) — A £). 

The idea of using the squared Euclidean distance in the LSM was proposed 
by [8] in order to have less approximation to be made in the variational esti¬ 
mation procedure. 

In the latentnet package, we use the function called ergmm to estimate 
the posterior distribution of the LSM parameters and latent positions. The 
argument d refers to the dimension of the latent space, which we set equal to 
2 to make the visualisation of the latent positions of the nodes easier. 

post.latentnet <- ergmm(y ~ euclidean(d = 2)) 

In the VBLPCM package, we can use the function called vblpcmf it to es¬ 
timate the posterior distribution of the LPCM by specifying the number of 
clusters. In order to estimate a LSM we consider one cluster by setting the 
argument G equal to 1. 

post.vblpcm < vblpcmfit(vblpcmstart(y, G = 1, d = 2)) 

It is important to notice that the variational maximisation algorithm is sub¬ 
ject of the risk of reaching local maximum. 

The package VBLPCM provides a special function called vblpcmstart to 
generate initial latent positions. This algorithm is based on the Fruchterman- 
Reingold method by default (argument START), but there is also the possibil¬ 
ity of using random values, geodesic distances or Graph Laplacian methods. 
In this function other model features such as sociality effects, and node co¬ 
variates can also be specified. 



In lvm4net we use the function called Ism to estimate the posterior distri¬ 
bution of the LSM parameters and latent positions using a variational inferen¬ 
tial approach. This function makes use of the Fruchterman-Reingold method 
to set the initial positions by default. Multi-start procedure can be imple¬ 
mented by changing the value associated to the argument ns tart and only 
the values reaching the maximum are stored. It is also possible to start from 
random initial positions by setting the argument randomZ equal to TRUE. 

From Table 1.4, we can see that the Ism function is much faster than the 
ergmm function. In this case, the squared Euclidean distance is used. 

post.lvm4net <- lsm(y[,], D = 2) 


Tab. 1.4 Timings in seconds to fit LSMs (no clustering, G = 1). 



Time in sec. 

latentnet 

111.20 

VBLPCM 

14.02 

lvm4net 

6.47 


The latent positions are invariant under rotation, reflection and transla¬ 
tions. For this reason we can match the rotations using the rotXtoY func¬ 
tion (included in lvm4net) in order to visualise and compare the latent posi¬ 
tions estimated by the three approaches using the plotY function (included 

in lvm4net). 

Z <- post.lvm4net$lsmEZ 

Zm <- rotXtoY(post.latentnet$mkl$Z,Z)$X 
Zv <- rotXtoY(post.vblpcm$V_z,Z)$X 

plotY(y[,], EZ = Zm, main = "latentnet") 
plotY(y[,], EZ = Zv, main = "VBLPCM") 
plotY (y[,l, EZ = Z, main = "lvm4net") 

In Figures 1.5, 1.6, 1.7 we can see the estimated latent positions obtained 
using the three packages. In this example, the visualisation results obtained 
from latentnet and lvm4net are similar even though the distance model 
adopted is different. 

Latent position cluster models (LPCMs) are latent space models which in¬ 
corporate a Gaussian mixture model structure for the latent positions of nodes 



latentnet 



Znl 

Tab. 1.5 Latent positions obtained by using the latentnet package. 


in the latent space in order to accommodate the clustering of nodes in the net¬ 
work. The latentnet and VBLPCM packages can be used to estimate latent 
position cluster models by fixing the number of clusters (G). For our toy ex¬ 
ample, we choose 2 clusters. 

post.latentnet.G2 < ergmm(y ~ euclidean(d =2, G = 2)) 

post.vblpcm.G2 <- vblpcmfit(vblpcmstart(y, G = 2, d = 2)) 

In Figures 1.8 and 1.9 we can see the latent positions and the clusters re¬ 
turned by the two packages. The two algorithms give very similar results as 
they find latent groups differing of just one node. 




VBLPCM 



Znl 

Tab. 1.6 Latent positions obtained by using the vblpcm package. 
Tab. 1.10 Timings in seconds to fit LPCM with two clusters (G = 2). 



Time in sec. 

latentnet 

86.27 

VBLPCM 

11.95 


From Table 1.10 it is possible to notice that the VBLPCM package is much 
faster than the latent net package. 

The latentnet package gives exact estimates as they are based on MCMC 
simulations from the posterior distribution. However it only allows to deal 
with small networks whereas the approximate approaches of the VBLPCM and 
lvm4net packages are able to handle networks on thousands of nodes. 




Ivm4net 



Znl 

Tab. 1.7 Latent positions obtained by using the ivm4net package. 


1 . 5.3 

Predictive goodness-of-fit (GoF) diagnostics 

An important feature of the Bayesian approach is to make available proce¬ 
dures to establish whether the estimated parameter posterior of the model 
achieves a good fit to the key topological features of the observed network. 

The function included in the Bergm package provides a useful tool for 
assessing Bayesian goodness-of-fit so as to examine the fit of the data to the 
posterior model obtained by the bergm function. The observed network data 
are compared with a set of networks simulated from independent parameter 




MKL Latent Positions of post.latentnet.G2 

y ~ euclidean(d = 2, G = 2) 



Zi 

Tab. 1.8 Estimated latent positions from LPCM with 2 clusters obtained by using the 
latentnet package. 


values of the posterior density estimate. This comparison is made in terms of 
high-level network statistics not explicitly included in the model [13]. 

The R code below is used to compare some high level network statistics 
observed in the Dolphin network with a series of network statistics simu¬ 
lated from 100 random realisations of the estimated posterior distribution 
post. est using 10,000 auxiliary iterations for the network simulation step. 
The bgof function included in the Bergm package returns the plots shown in 
Figure 1.11. 

bgof(post, 

n.deg = 20, 
n.dist = 15, 
n.esp = 15) 

In Figure 1.11 we see, based on the various GoF statistics, that the networks 
simulated from the posterior distribution are in reasonable agreement with 
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Variational-Bayes Positions 



Tab. 1.9 Estimated latent positions from LPCM with 2 clusters obtained by using the vblpcm 
package. 


the observed network. We can therefore conclude that the model is a reason¬ 
able fit to the data, despite its simplicity. 

In the LSM context, it is possible to use the gof function included in 
latentnet and VBLPCM and the gof Ism function included in the lvm4net 
package to perform posterior GoF diagnostics. The GOF argument can be used 
to set the types of GoF statistics we want to analyse. Figures 1.12, 1.13, and 
1.14 display the GoF plots. 

gf.latentnet <- gof(post.latentnet, 

GOF = ~ degree + esp + distance) 

plot(gf.latentnet) 

gf.vblpcm <- gof(post.vblpcm, 

GOF = ~ degree + esp + distance) 


plot(gf.vblpcm) 
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degree minimum geodesic distance edge-wise shared partners 

Tab. 1.11 GoF diagnostics for ERGM (Bergm package): The red line displays the goodness 
of fit statistics for the observed data together with boxplots of GoF network statistics based on 
100 simulated networks from the posterior distribution. 


gf.lvm4net <- gofIsm(post.Ivm4net, 

Y = y [, ] , 

stats = c("degree", "esp", "distance"), 
doplot = FALSE) 

plot (gf.Ivm4net) 

The GoF analysis indicates that the LSM estimated by using the varia¬ 
tional approximation with squared Euclidean distance implemented in the 
lvm4net package displays a better fit of the model to the data compared to 
the other two approaches. 

To display the GoF diagnostics for the LPCMs estimated above, we can use 
the same R functions. 

gf.latentnet.G2 <- gof(post.latentnet.G2, 

GOF = ~ degree + esp + distance) 

plot(gf.latentnet.G2) 

gf.vblpcm.G2 <- gof(post.vblpcm.G2, 

GOF = ~ degree + esp + distance) 
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degree 


edge-wise shared partners minimum geodesic distance 


Tab. 1.12 GoF diagnostics for LSM (latentnet package): The solid black line displays the 
goodness of fit statistics for the observed data together with boxplots of GoF network statistics 
based on 100 simulated networks from the posterior distribution. 





02468 10 13 02468 10 13579 11 

degree edge-wise shared partners minimum geodesic distance 

Tab. 1.13 GoF diagnostics for LSM (vblpcm package): The solid black line displays the 
goodness of fit statistics for the observed data together with boxplots of GoF network statistics 
based on 100 simulated networks from the posterior distribution. 





plot(gf.vblpcm.G2) 

From Figures 1.15 and 1.16 we can see that the VBLPCM package has a better 
fit to the data in terms of edgewise shared partners distributions compared to 
the latetnet package. For this example, the inclusion of 2 clusters does not 













degree 


edge-wise shared partners 


minimum geodesic distance 


Tab. 1.14 GoF diagnostics for LSM (ivm4net package): The red line displays the goodness 
of fit statistics for the observed data together with boxplots of GoF network statistics based on 
100 simulated networks from the posterior distribution. 




Q. 



edge-wise shared partners 


minimum geodesic distance 


Tab. 1.15 GoF diagnostics for LPCM with 2 clusters (latentnet package): The solid black 
line displays the goodness of fit statistics for the observed data together with boxplots of GoF 
network statistics based on 100 simulated networks from the posterior distribution. 


seem to produce a significant improvement in terms of GoF with respect to 
the LSM without clustering. 
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Tab. 1.16 GoF diagnostics for LPCM with 2 clusters (vblpcm package): The solid black line 
displays the goodness of fit statistics for the observed data together with boxplots of GoF 
network statistics based on 100 simulated networks from the posterior distribution. 


1.6 

Conclusions 

This chapter provided an overview of a number of social network models em¬ 
phasising the computational perspective. In fact, the most important issue as¬ 
sociated to statistical social network models is concerting their computational 
complexity which requires the development of efficient inferential algorithms 
and software able to deal with the increasing size of relational data available. 

In particular, we have presented some recent advanced Bayesian ap¬ 
proaches to parameter estimation of exponential random graph models and 
latent variable network models. We demonstrated that Bayesian inference is 
a very helpful and powerful approach allowing a formal treatment of uncer¬ 
tainty using the rules of probability 

We discussed how Bayesian parameter estimation for exponential random 
graph models and latent space models is a computationally intensive prob¬ 
lem that can be tackled using advanced MCMC and variational techniques. 
We illustrated the main capabilities of the Bergm, latentnet, VBLPCM and 
lvm4net packages for the open-source R software through a tutorial anal¬ 
ysis of a well-known social network dataset. For each modelling approach 
we have also considered a Bayesian graphical test of goodness of fit to assess 
whether or not a given parametric model is compatible with the observed 
network data. 
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to effectively capture heterogeneity and organise different sources of informa¬ 
tion commonly available in social network data. For this reason, we believe 
statistical social network analysis will became fertile ground for interdisci¬ 
plinary research in advanced statistics and social network analysis applica¬ 
tions. 
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